JIT: Fix `emitter::emitSplit` to not create zero-sized method fragment #107568

amanasifkhalid · 2024-09-09T20:24:43Z

Fixes #106379 (comment). When considering the final split of a method body, skip the split if it will create a zero-sized fragment at the end of the method. Under stress modes, it's fine to ignore the desired fragment size, but in the rare case where we need to split the last piece of the method and it is exactly 512KB, skipping the split might break ISA invariants around how EH data is reported. This seems terribly unlikely, but I documented this just in case we hit it some day.

amanasifkhalid · 2024-09-09T22:19:47Z

cc @dotnet/jit-contrib, @BruceForstall could you PTAL? Thanks!

BruceForstall · 2024-09-10T20:51:28Z

I'll look soon.

However, regarding "This seems terribly unlikely" -- that is not a satisfying reason to not ensure this works 100% of the time. I've worked on a lot of software where "terribly unlikely" == "guaranteed to happen".

amanasifkhalid · 2024-09-10T21:27:46Z

However, regarding "This seems terribly unlikely" -- that is not a satisfying reason to not ensure this works 100% of the time. I've worked on a lot of software where "terribly unlikely" == "guaranteed to happen".

Agreed; you inspired me to do a quick audit of the coreclr_tests collection, and we do have a handful of methods that have to be split up on ARM32/ARM64/RISC-V/etc. to report their EH info correctly (the largest IL size comes in at 7,557,059 bytes), though none of them happened to be susceptible to creating a zero-size fragment. Instead of trying to add padding this late in the emitter, I think we can instead shorten the candidate size by the size of an instruction if it is equal to the remaining method size. This way, the last fragment will have a non-zero size, and all the fragments will be guaranteed to fit in the unwind data.

Edit: I thought more about this, and doing something like moving the last instruction in the last candidate IG to a new IG to create a non-zero fragment size seems nontrivial. Between the two, it seems safer to give the next IG a non-zero size with padding, though before we pursue that, it would be helpful to verify if having unwind data corresponding to a zero-size code fragment is a problem on platforms that split fragments.

BruceForstall · 2024-09-12T23:21:48Z

@amanasifkhalid I'm still trying to understand how the problem could occur. It seems like we are processing IGs in sequence, and if we collect enough that requires splitting, we will do so before reaching the last one. If we reach the last one, and call splitIfNecessary() outside the loop, it will only split if we need a potential split between the last IG in the list and its previous IG. That would imply that the last IG has non-zero size, otherwise we would have split before it.

Where does the zero sized IG come from?

Do you have a SPMI MCH I can use to repro?

amanasifkhalid · 2024-09-13T15:47:35Z

Where does the zero sized IG come from?

The zero-sized IG seems to be specific to JitStress. From the JIT dump:

G_M11858_IG08:        ; offs=0x000098, size=0x0000, bbWeight=1, gcVars=0000000000000008 {V01}, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB05 [0008], gcvars, byref, isz
                             ; byrRegs -[x3]


*** JitStress: STRESS_EMITTER ***

IN001b: 000098      align   [0 bytes for IG09]
IN001c: 000098      align   [0 bytes]
IN001d: 000098      align   [0 bytes]
IN001e: 000098      align   [0 bytes]
						;; size=0 bbWeight=1 PerfScore 0.00
G_M11858_IG09:        ; offs=0x000098, size=0x000C, bbWeight=9.00, gcVars=0000000000000009 {V00 V01}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, funclet prolog, nogc

This seems to match the scenario described in this comment in emitter::emitSplit:

// Don't report a zero-size candidate. This will only occur in a stress mode with JitSplitFunctionSize
// set to something small, and a zero-sized IG (possibly inserted for use by the alignment code). Normally,
// the split size will be much larger than the maximum size of an instruction group. The invariant we want
// to maintain is that each fragment contains a non-zero amount of code.
if (candidateSize == 0)

So I'm guessing this scenario is only possible under stress modes? We shouldn't normally see zero-sized IGs, right?

Do you have a SPMI MCH I can use to repro?

Sure. I'm not getting a good collection using an AltJit, so I'm going to rebuild my arm64 runtime; just a minute...

amanasifkhalid · 2024-09-13T17:31:10Z

@BruceForstall if you have an arm64 machine to try this on, Method4 hits asserts with DOTNET_JitStress=1:

using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;
using System.Numerics;
public class TestClass
{
    public struct S1
    {
        public struct S1_D1_F1
        {
            public int int_0;
        }
        public sbyte sbyte_1;
    }
    public struct S2
    {
        public Vector<uint> v_uint_2;
    }
    static Vector64<short> s_v64_short_20 = Vector64.Create(((short)(4)), -5, 4, 2);
    static Vector64<int> s_v64_int_22 = Vector64<int>.AllBitsSet;
    static Vector64<long> s_v64_long_24 = Vector64<long>.AllBitsSet;
    static Vector128<int> s_v128_int_32 = Vector128.CreateScalar(((int)(2)));
    static S1 s_s1_52 = new S1();
    static S2 s_s2_53 = new S2();
    Vector64<short> v64_short_71 = Vector64.Create(((short)(4)), -2, -1, 4);
    Vector64<long> v64_long_75 = Vector64<long>.Zero;
    Vector128<int> v128_int_83 = Vector128.Create(((int)(0)), 2, 4, -1);
    S1 s1_103 = new S1();
    S2 s2_104 = new S2();
    static int s_loopInvariant = 8;
    public S2 Method4(out S1 p_s1_198, S2 p_s2_199, ref S2 p_s2_200, out S1 p_s1_201, ref S1 p_s1_202, S2 p_s2_203, S2 p_s2_204, out S1 p_s1_205)
    {
        unchecked
        {
            p_s1_198 = s1_103;
            p_s1_201 = p_s1_198;
            p_s1_205 = s1_103;
            try
            {
                s_v64_long_24 = ((Vector64<long>)(((Vector64<long>)(s_v64_long_24 = ((Vector64<long>)(v64_long_75 = v64_long_75)))) | s_v64_long_24));
            }
            finally
            {
                int __loopvar0 = s_loopInvariant, __loopSecondaryVar0_0 = s_loopInvariant - 9;
                for (;;)
                {
                    if (__loopvar0 <= s_loopInvariant + 9)
                        break;
                    try
                    {
                        v128_int_83 -= ((Vector128<int>)(((Vector128<int>)(((Vector128<int>)(s_v128_int_32 -= v128_int_83)) | ((Vector128<int>)(s_v128_int_32 | Vector128.WithUpper(s_v128_int_32, s_v64_int_22))))) | v128_int_83));
                    }
                    finally
                    {
                        v64_short_71 *= ((Vector64<short>)(s_v64_short_20 *= ((Vector64<short>)(Vector64.LessThanOrEqual(v64_short_71, s_v64_short_20) - ((Vector64<short>)(((Vector64<short>)(s_v64_short_20 * s_v64_short_20)) - ((Vector64<short>)(Vector64<short>.AllBitsSet * v64_short_71))))))));
                    }
                }
            }
            return s_s2_53;
        }
    }
    public void Method0()
    {
        unchecked
        {
            S1 s1_2942 = new S1();
            S2 s2_2943 = new S2();
            s_s2_53 = Method4(out s1_103, s_s2_53, ref s2_104, out s_s1_52, ref s1_103, s2_2943, s2_104, out s1_2942);
        }
    }
    public static void Main(string[] args)
    {
        new TestClass().Method0();
    }
}

BruceForstall

I wonder if some more verbose comments about the possible scenario, as described here would be helpful.

E.g., why is this only a problem for the last call to splitIfNecessary and not the ones in between? I think that's because any zero-sized IG that aren't the last will get "absorbed" into following non-zero-sized IGs.

BruceForstall · 2024-09-19T01:01:29Z

src/coreclr/jit/emit.cpp

+        // TODO: This scenario may be possible (but very unlikely) when using the default fragment size.
+        // Fixing this would likely require padding out the last fragment to not be zero-sized, and doing the split.
+        // For now, assume this only occurs under stress modes, or when using DOTNET_JitSplitFunctionSize.
+        assert(maxSplitSize != UW_MAX_FRAGMENT_SIZE_BYTES);


I don't think this comment is quite right. Yes, it is unlikely. But we're talking about a single empty fragment at the end. If there was a non-empty fragment, then curSize != candidateSize. So it seems like the fix works for any split size, including the default, and doesn't require this assert or TODO. (It might require other commenting/documentation, but that's a separate question.)

But we're talking about a single empty fragment at the end.

Yeah, on second thought, I think you're right. If we have a 512KB-sized IG, followed by an empty IG that happens to be zero-sized, then the last fragment should be exactly 512KB in size, which shouldn't break any unwind data invariants. I'll remove the TODO.

amanasifkhalid · 2024-09-19T17:52:38Z

I wonder if some more verbose comments about the possible scenario, as described #106379 (comment) would be helpful.

Thanks for digging into this example; I've added some comments describing the scenarios that could hit the fix.

Fix emitSplit logic

d40e42f

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 9, 2024

dotnet-policy-service bot assigned amanasifkhalid Sep 9, 2024

amanasifkhalid requested a review from a team September 10, 2024 20:48

BruceForstall self-requested a review September 10, 2024 20:50

BruceForstall reviewed Sep 19, 2024

View reviewed changes

amanasifkhalid added 2 commits September 19, 2024 13:25

Remove TODO

0fbeb49

Add more comments

e6f57af

BruceForstall approved these changes Sep 19, 2024

View reviewed changes

amanasifkhalid merged commit 8756775 into dotnet:main Sep 20, 2024
108 checks passed

amanasifkhalid deleted the fix-emit-split branch September 20, 2024 03:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: Fix `emitter::emitSplit` to not create zero-sized method fragment #107568

JIT: Fix `emitter::emitSplit` to not create zero-sized method fragment #107568

amanasifkhalid commented Sep 9, 2024

amanasifkhalid commented Sep 9, 2024

BruceForstall commented Sep 10, 2024

amanasifkhalid commented Sep 10, 2024 •

edited

Loading

BruceForstall commented Sep 12, 2024

amanasifkhalid commented Sep 13, 2024

amanasifkhalid commented Sep 13, 2024

BruceForstall left a comment

BruceForstall Sep 19, 2024

amanasifkhalid Sep 19, 2024

amanasifkhalid commented Sep 19, 2024

JIT: Fix emitter::emitSplit to not create zero-sized method fragment #107568

JIT: Fix emitter::emitSplit to not create zero-sized method fragment #107568

Conversation

amanasifkhalid commented Sep 9, 2024

amanasifkhalid commented Sep 9, 2024

BruceForstall commented Sep 10, 2024

amanasifkhalid commented Sep 10, 2024 • edited Loading

BruceForstall commented Sep 12, 2024

amanasifkhalid commented Sep 13, 2024

amanasifkhalid commented Sep 13, 2024

BruceForstall left a comment

Choose a reason for hiding this comment

BruceForstall Sep 19, 2024

Choose a reason for hiding this comment

amanasifkhalid Sep 19, 2024

Choose a reason for hiding this comment

amanasifkhalid commented Sep 19, 2024

JIT: Fix `emitter::emitSplit` to not create zero-sized method fragment #107568

JIT: Fix `emitter::emitSplit` to not create zero-sized method fragment #107568

amanasifkhalid commented Sep 10, 2024 •

edited

Loading